home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Language/OS - Multiplatform Resource Library
/
LANGUAGE OS.iso
/
smaltalk
/
st80_vw.lha
/
st80_vw
/
hashStatistics.st
< prev
next >
Wrap
Text File
|
1993-07-24
|
4KB
|
110 lines
" NAME hashStatistics.st
AUTHOR bruce@utafll.uta.edu (Bruce Samuelson)
FUNCTION measures hash efficiency of Set, Dict, subclasses
ST-VERSIONS ParcPlace VisualWorks 1.0 and ObjectWorks 4.1
PREREQUISITES none
CONFLICTS none
DISTRIBUTION world
VERSION 1.0
DATE July 8, 1993
SUMMARY These methods in the 'public domain' protocols of Set and Bag measure
how well a Set, Dictionary, IdentitySet, or IdentityDictionary is hashed.
Bruce Samuelson"!
'From VisualWorks(TM), Release 1.0 of 8 October 1992 on 8 July 1993 at 3:12:18 pm'!
!Set methodsFor: 'public domain'!
hashStatistics
"Copyright (c) 1993 by Bruce Samuelson. Released to the public domain.
This method tests how well the receiver is hashed. It works for Set and
its subclasses such as IdentitySet, Dictionary, and IdentityDictionary. It is
compatible with ParcPlace ObjectWorks 4.1 and VisualWorks 1.0. but
not ObjectWorks 4.0 because OW 4.1 changed the way hash probes are
calculated. It relies on intimate details of the findElementOrNil: and
findKeyOrNil: methods and may fail to work with ParcPlace's next release.
Return an array:
at: 1 basicSize of dictionary
at: 2 size of dictionary
at: 3 average miss of hash function
N means it took N steps to find a hole for the average element
0 means hash is ideal; large number means hash is bad
at: 4 histogram of misses
Please forgive the usage of isKindOf:. I'm trying to limit this public domain
fileout to two methods, Set>>hashStatistics and Bag>>sortedElements.
If I didn't use the kludge, I'd either have to include several versions of
hashStatistics or several support methods in scattered classes.
The first test below measures how well the Smalltalk dictionary is hashed.
In the original VW 1.0 image, its average miss is 7 steps, which is atrocious.
The worst element misses by 195 steps!! In my image, which has a few
hundred classes added, the average miss is about 1, presumably because
it got rehashed to a better layout as it grew. The second and third tests show
measurements on sets derived from the Smalltalk dictionary. The reason why
the first three tests yield different results is left as an exercise for the reader."
"Smalltalk hashStatistics"
"Smalltalk keys hashStatistics"
"(Smalltalk keys collect: [:key | key asString]) hashStatistics"
"Set new hashStatistics"
"(Set with: 1.1 with: 2.2 with: 3.3) hashStatistics"
"(IdentitySet with: #size with: #+ with: #do:) hashStatistics"
"(Dictionary with: 'How'->'adv' with: 'are'->'v' with: 'you?'->'n') hashStatistics"
"(IdentityDictionary with: 1->1 with: 2->4 with: 3->9) hashStatistics"
| enumerate hash access compare basicSize size totalMiss histogram |
(self isKindOf: Dictionary) "Embarassing: see apology above."
ifTrue:
[enumerate := #keysDo:.
access := (self isKindOf: IdentityDictionary) ifTrue: [#yourself] ifFalse: [#key]]
ifFalse:
[enumerate := #do:.
access := #yourself].
((self isKindOf: IdentityDictionary) or: [self isKindOf: IdentitySet]) "Red faced: see apology above."
ifTrue:
[hash := #identityHash.
compare := #==]
ifFalse:
[hash := #hash.
compare := #=].
basicSize := self basicSize.
size := self size.
totalMiss := 0.
histogram := Bag new: 100.
self perform: enumerate with:
[:elementOrKey |
| miss index probe |
miss := 0.
index := self initialIndexFor: (elementOrKey perform: hash) boundedBy: basicSize.
[(probe := self basicAt: index) notNil and: [(probe perform: access) perform: compare with: elementOrKey]]
whileFalse:
[miss := miss + 1.
(index := index + 1) > basicSize ifTrue: [index := 1]].
histogram add: miss.
totalMiss := totalMiss + miss].
^Array
with: basicSize
with: size
with: (totalMiss / (size max: 1)) asFloat
with: histogram sortedElements! !
!Bag methodsFor: 'public domain'!
sortedElements
"Copyright (c) 1993 by Bruce Samuelson. Released to the public domain.
Answer a collection of elements with counts, sorted by element. A version with
unoptimized implementation used to be in PPS ObjectWorks 4.0."
| tmp |
tmp := OrderedCollection new: contents basicSize.
contents associationsDo: [:assn | tmp add: assn].
^tmp asSortedCollection "faster than adding elements individually to sorted coll"! !